14 research outputs found
Multi-Agent Programming Contest 2011 - The Python-DTU Team
We provide a brief description of the Python-DTU system, including the
overall design, the tools and the algorithms that we plan to use in the agent
contest. Comment: 4 pages
Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing
Given a string S, the compressed indexing problem is to preprocess S into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of S while supporting fast queries. We present a compressed index based on the Lempel-Ziv 1977 compression scheme. Let n and z denote the size of the input string and of the compressed LZ77 string, respectively. We obtain the following time-space trade-offs. Given a pattern string P of length m, we can solve the problem in
(i) O(m + occ lglg n) time using O(z lg(n/z) lglg z) space, or
(ii) O(m(1 + lg^e z / lg(n/z)) + occ(lglg n + lg^e z)) time using O(z lg(n/z)) space, for any 0 < e < 1,
where occ is the number of occurrences of P in S.
In particular, (i) improves the leading term in the query time of the previous best solution from O(m lg m) to O(m) at the cost of increasing the space by a factor lglg z. Alternatively, (ii) matches the previous best space bound, but has a leading term in the query time of O(m(1 + lg^e z / lg(n/z))). However, for any polynomial compression ratio, i.e., z = O(n^{1-d}) for constant d > 0, this becomes O(m). Our index also supports extraction of any substring of length l in O(l + lg(n/z)) time. Technically, our results are obtained by novel extensions and combinations of existing data structures of independent interest, including a new batched variant of weak prefix search.
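The step from the leading term of (ii) to O(m) under a polynomial compression ratio can be checked directly. The following is a short derivation sketch, assuming z <= c n^{1-d} for constants c > 0 and d > 0 (the constant c is introduced here only for illustration) and using z <= n:

```latex
\begin{align*}
z \le c\,n^{1-d} &\implies \frac{n}{z} \ge \frac{n^{d}}{c}
  \implies \lg\frac{n}{z} \ge d\lg n - \lg c, \\
z \le n &\implies \lg^{e} z \le \lg^{e} n, \\
\frac{\lg^{e} z}{\lg(n/z)} &\le \frac{\lg^{e} n}{d\lg n - \lg c}
  = O\!\left(\frac{1}{d\,\lg^{1-e} n}\right) = o(1)
  \quad\text{for constant } d > 0 \text{ and } 0 < e < 1, \\
m\left(1 + \frac{\lg^{e} z}{\lg(n/z)}\right) &= O(m).
\end{align*}
```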
Fast Dynamic Arrays
We present a highly optimized implementation of tiered vectors, a data
structure for maintaining a sequence of n elements supporting access in time
O(1) and insertion and deletion in time O(n^e) for e > 0
while using o(n) extra space. We consider several different implementation
optimizations in C++ and compare their performance to that of vector and
multiset from the standard library on sequences with up to 10^8 elements. Our
fastest implementation uses much less space than multiset while providing
speedups of 40x for access operations compared to multiset and speedups
of 10,000x compared to vector for insertion and deletion operations
while being competitive with both data structures for all other operations.
Fast Dynamic Arrays
We present a highly optimized implementation of tiered vectors, a data structure for maintaining a sequence of n elements supporting access in time O(1) and insertion and deletion in time O(n^e) for e > 0 while using o(n) extra space. We consider several different implementation optimizations in C++ and compare their performance to that of vector and set from the standard library on sequences with up to 10^8 elements. Our fastest implementation uses much less space than set while providing speedups of 40x for access operations compared to set and speedups of 10,000x compared to vector for insertion and deletion operations while being competitive with both data structures for all other operations.
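To make the access/insertion trade-off concrete, here is a minimal two-level sketch of the tiered-vector idea in Python. It is not the authors' C++ implementation. The class names `_Block` and `TieredVector`, the fixed `block_size` parameter, and the omission of deletion and of rebuilding as n grows are all assumptions made for illustration. Each block is a fixed-capacity circular buffer, every block except the last is kept full, and an insertion shifts elements inside one block and then moves one element across each later block.

```python
class _Block:
    """Fixed-capacity circular buffer (hypothetical helper for this sketch)."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0   # physical index of logical position 0
        self.size = 0

    def get(self, j):
        return self.buf[(self.head + j) % len(self.buf)]

    def push_front(self, x):
        # O(1): rotate the head backwards instead of shifting elements.
        self.head = (self.head - 1) % len(self.buf)
        self.buf[self.head] = x
        self.size += 1

    def pop_back(self):
        # O(1): drop and return the last logical element.
        self.size -= 1
        return self.buf[(self.head + self.size) % len(self.buf)]

    def insert(self, j, x):
        # O(capacity): shift logical positions j..size-1 one step right.
        cap = len(self.buf)
        for k in range(self.size, j, -1):
            self.buf[(self.head + k) % cap] = self.buf[(self.head + k - 1) % cap]
        self.buf[(self.head + j) % cap] = x
        self.size += 1


class TieredVector:
    """Two-level sketch: all blocks except the last are kept full."""

    def __init__(self, block_size=4):
        self.B = block_size   # assumed fixed; choosing B near sqrt(n) balances costs
        self.blocks = []
        self.n = 0

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        # O(1): one division locates the block, one modulo the slot.
        if not 0 <= i < self.n:
            raise IndexError(i)
        return self.blocks[i // self.B].get(i % self.B)

    def insert(self, i, x):
        # O(B + n/B): one in-block shift, then O(1) work per later block.
        if not 0 <= i <= self.n:
            raise IndexError(i)
        b, j = divmod(i, self.B)
        carry = x
        while True:
            if b == len(self.blocks):
                self.blocks.append(_Block(self.B))
            blk = self.blocks[b]
            full = blk.size == self.B
            spill = blk.pop_back() if full else None
            if j == 0:
                blk.push_front(carry)
            else:
                blk.insert(j, carry)
            if not full:
                break
            carry, b, j = spill, b + 1, 0   # overflow moves to the next block's front
        self.n += 1


tv = TieredVector(block_size=2)
for ch in "abc":
    tv.insert(len(tv), ch)
tv.insert(1, "x")
print([tv[k] for k in range(len(tv))])  # ['a', 'x', 'b', 'c']
```

With B close to sqrt(n) this gives O(1) access and O(sqrt(n)) insertion, which corresponds to the e = 1/2 case of the O(n^e) bound. Smaller exponents need more levels, which this sketch does not attempt.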
Compressed Indexing with Signature Grammars
The compressed indexing problem is to preprocess a string S of length n
into a compressed representation that supports pattern matching queries. That
is, given a string P of length m, report all occurrences of P in S.
We present a data structure that supports pattern matching queries in O(m + occ(lglg n + lg^e z)) time using O(z lg(n/z)) space, where z
is the size of the LZ77 parse of S and e > 0 is an arbitrarily small
constant, when the alphabet is small or z = O(n^{1-d}) for any
constant d > 0. We also present two data structures for the general
case: one where the space is increased by O(z lglg z), and one where the
query time changes from worst-case to expected. These results improve the
previously best known solutions. Notably, this is the first data structure that
decides if P occurs in S in O(m) time using O(z lg(n/z)) space.
Our results are mainly obtained by a novel combination of a randomized
grammar construction algorithm with well-known techniques relating pattern
matching to 2D-range reporting.
Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing
Given a string S, the compressed indexing problem is to preprocess S
into a compressed representation that supports fast substring
queries. The goal is to use little space relative to the compressed size of S
while supporting fast queries. We present a compressed index based on the
Lempel-Ziv 1977 compression scheme. We obtain the following time-space
trade-offs. For constant-sized alphabets: (i) O(m + occ lglg n) time using
O(z lg(n/z) lglg z) space, or (ii) O(m(1 + lg^e z / lg(n/z)) + occ(lglg n + lg^e z)) time using O(z lg(n/z)) space. For integer
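alphabets polynomially bounded by n: (iii) O(m(1 + lg^e z / lg(n/z)) + occ lglg n) time using O(z(lg(n/z) + lglg z)) space, or (iv) O(m + occ(lglg n + lg^e z)) time using
O(z(lg(n/z) + lg^e z)) space, where n and m are the lengths of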
the input string and query string respectively, z is the number of phrases in
the LZ77 parse of the input string, occ is the number of occurrences of the
query in the input, and e > 0 is an arbitrarily small constant. In
particular, (i) improves the leading term in the query time of the previous
best solution from O(m lg m) to O(m) at the cost of increasing the space by
a factor lglg z. Alternatively, (ii) matches the previous best space
bound, but has a leading term in the query time of O(m(1 + lg^e z / lg(n/z))). However, for any polynomial compression ratio, i.e., z = O(n^{1-d}) for constant d > 0, this becomes O(m). Our index
also supports extraction of any substring of length l in O(l + lg(n/z)) time. Technically, our results are obtained by novel extensions and
combinations of existing data structures of independent interest, including a
new batched variant of weak prefix search.
Optimal-Time Dictionary-Compressed Indexes
We describe the first self-indexes able to count and locate pattern
occurrences in optimal time within a space bounded by the size of the most
popular dictionary compressors. To achieve this result we combine several
recent findings, including string attractors (new combinatorial
objects encompassing most known compressibility measures for highly repetitive
texts) and grammars based on locally-consistent parsing.
In more detail, let γ be the size of the smallest attractor for a text T
of length n. The measure γ is an (asymptotic) lower bound on the
size of dictionary compressors based on Lempel-Ziv, context-free grammars, and
many others. The smallest known text representations in terms of attractors use
space O(γ log(n/γ)), and our lightest indexes work within the same
asymptotic space. Let e > 0 be a suitably small constant fixed at
construction time, m be the pattern length, and occ be the number of its
text occurrences. Our index counts pattern occurrences in O(m + log^{2+e} n)
time, and locates them in O(m + (occ + 1) log^e n) time. These times already outperform those of most dictionary-compressed
indexes, while obtaining the least asymptotic space for any index searching
within O((m + occ) polylog n) time. Further, by increasing the space
to O(γ log(n/γ) log^e n), we reduce the locating time to the
optimal O(m + occ), and within O(γ log(n/γ) log n) space we can
also count in optimal O(m) time. No dictionary-compressed index had obtained
this time before. All our indexes can be constructed in O(n) space and
O(n log n) expected time.
As a byproduct of independent interest..
Decompressing Lempel-Ziv Compressed Text
We consider the problem of decompressing the Lempel-Ziv 77 representation of
a string S of length n using a working space as close as possible to the
size z of the input. The folklore solution for the problem runs in O(n)
time but requires random access to the whole decompressed text. Another
folklore solution is to convert LZ77 into a grammar of size O(z log(n/z)) and
then stream S in linear time. In this paper, we show that O(n) time and
O(z) working space can be achieved for constant-size alphabets. On general
alphabets of size σ, we describe (i) a trade-off achieving O(n log^d σ)
time and O(z log^{1-d} σ) space for any
0 <= d <= 1, and (ii) a solution achieving O(n) time and
O(z loglog(n/z)) space. The latter solution, in particular, dominates both
folklore algorithms for the problem. Our solutions can, more generally, extract
any specified subsequence of S with little overhead on top of the linear
running time and working space. As an immediate corollary, we show that our
techniques yield improved results for pattern matching problems on
LZ77-compressed text.
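The first folklore solution mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's algorithm: the phrase format (source position, copy length, next character) is one common LZ77 convention chosen here, and `lz77_parse` is a naive quadratic-time greedy parser included only so the example can round-trip. The decompressor runs in time linear in the output length but keeps the whole decompressed text in memory, which is the random-access requirement the abstract refers to.

```python
def lz77_parse(s):
    """Naive greedy LZ77 parse into (src, length, next_char) triples.

    Assumed phrase format for this sketch; quadratic time or worse, for demonstration only.
    """
    phrases, i = [], 0
    while i < len(s):
        best_src, best_len = 0, 0
        for src in range(i):
            length = 0
            # Stop one character early so every phrase ends with a literal.
            while i + length < len(s) - 1 and s[src + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = src, length
        phrases.append((best_src, best_len, s[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decompress(phrases):
    """Folklore decompression: copy character by character from earlier output.

    Needs random access to everything produced so far (the list `out`).
    Copying one character at a time also handles self-overlapping phrases.
    """
    out = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])
        out.append(ch)
    return "".join(out)


phrases = lz77_parse("abababc")
print(phrases)                   # [(0, 0, 'a'), (0, 0, 'b'), (0, 4, 'c')]
print(lz77_decompress(phrases))  # abababc
```

The third phrase copies four characters starting at position 0 while only two characters exist at that point, so the copy overlaps its own output. Here z = 3 phrases describe a string of length n = 7.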